Consider the following network communication setup, originating in a sensor networking application we
refer to as the "sensor reachback" problem. We have a directed graph $G = (V, E)$, where
$V = \{v_0, v_1, \ldots, v_M\}$ and $E \subseteq V \times V$. If $(v_i, v_j) \in E$, then node $i$ can send messages to node $j$
over a discrete memoryless channel $p_{ij}(y|x)$, of capacity $C_{ij}$. The channels are independent. Each node
$v_i$ gets to observe a source of information $U_i$ ($i = 0 \ldots M$), with joint distribution $p(U_0 U_1 \ldots U_M)$.
Our goal is to solve an incast problem in $G$: nodes exchange messages with their neighbors, and after a
finite number of communication rounds, one of the $M + 1$ nodes ($v_0$ by convention) must have received
enough information to reproduce the entire field of observations $(U_0 U_1 \ldots U_M)$, with arbitrarily small
probability of error. In this paper, we prove that such perfect reconstruction is possible if and only if

$$H(U_S | U_{S^c}) < \sum_{i \in S,\, j \in S^c} C_{ij}$$

for all $S \subseteq \{0 \ldots M\}$, $S \neq \emptyset$, $0 \in S^c$. Our main finding is that in this setup a general source/channel
separation theorem holds, and that Shannon information behaves as a classical network flow, identical in
nature to the flow of water in pipes. At first glance, it might seem surprising that separation holds in a
fairly general network situation like the one we study. A closer look, however, reveals that the reason for
this is that our model allows only for independent point-to-point channels between pairs of nodes, and not
multiple-access and/or broadcast channels, for which separation is well known not to hold [5, pp. 448-49].
This "information as flow" view provides an algorithmic interpretation for our results, among which perhaps
the most important one is the optimality of implementing codes using a layered protocol stack.

I. Introduction

A. The Sensor Reachback Problem 

Wireless sensor networks made up of small, cheap, and mostly unreliable devices equipped with limited
sensing, processing and transmission capabilities have recently sparked a fair amount of interest in
communications problems involving multiple correlated sources and large-scale wireless networks [6]. It is
envisioned that an important class of applications for such networks involves a dense deployment of a large 
number of sensors over a fixed area, in which a physical process unfolds — the task of these sensors is then 
to collect measurements, encode them, and relay them to some data collection point where this data is to 
be analyzed, and possibly acted upon. This scenario is illustrated in Fig. 1. 




Fig. 1. A large number of sensors is deployed over a target area. After collecting the data of interest, the sensors must reach back 
and transmit this information to a single receiver (e.g., an overflying plane) for further processing. 

There are several aspects that make this communications problem interesting: 

• Correlated Observations: If we have a large number of nodes sensing a physical process within a 
confined area, it is reasonable to assume that their measurements are correlated. This correlation may 
be exploited for efficient encoding/decoding. 

• Cooperation among Nodes: Before transmitting data to the remote receiver, the sensor nodes may 
establish a conference to exchange information over the wireless medium and increase their efficiency 
or flexibility through cooperation. 

• Channel Interference: If multiple sensor nodes use the wireless medium at the same time (either 
for conferencing or reachback), their signals will necessarily interfere with each other. Consequently, 
reliable communication in a reachback network requires a set of rules that control (or exploit) the 
interference in the wireless medium. 

In order to capture some of these key aspects, while still being able to provide complete results, we make 
some modeling assumptions, discussed next.

1) Source Model: We assume that the sources are memoryless, and thus consider only the spatial
correlation of the observed samples and not their temporal dependence (since the latter dependencies could
be dealt with by simple extensions of our results to the case of ergodic sources). Furthermore, each sensor
node $v_i$ observes only one component $U_i$ and must transmit enough information to enable the sink node $v_0$ to
reconstruct the whole vector $U_1 U_2 \cdots U_M$. This assumption is the most natural one to make for scenarios in
which data is required at a remote location for fusion and further processing, but the data capture process is 
distributed, with sensors able to gather local measurements only, and deeply embedded in the environment. 

A conceptually different approach would be to assume that all sensor nodes get to observe independently 
corrupted noisy versions of one and the same source of information U, and it is this source (and not the 
noisy measurements) that needs to be estimated at a remote location. This approach seems better suited for 
applications involving non-homogeneous sensors, where each one of the sensors gets to observe different 
characteristics of the same source (e.g., multispectral imaging), and therefore leads to a conceptually very 
different type of sensing applications from those of interest in this work. Such an approach leads to the so 
called CEO problem studied by Berger, Zhang and Viswanathan in [7]. 

2) Independent Channels: Our motivation to consider a network of independent DMCs is twofold. 
From a pure information-theoretic point of view, independent channels are interesting because, as shown
in this paper, this assumption gives rise to long Markov chains which play a central role in our ability
to prove the converse part of our coding theorem, and thus obtain conclusive results in terms of capacity. 
Moreover, a corollary of said coding theorem does provide a conclusive answer for a special case of the 
multiple access channel with correlated sources, a problem for which no general converse is known. 

From a more practical point of view, the assumption of independent channels is valid for any network 
that controls interference by means of a reservation-based medium-access control protocol (e.g., TDMA). 
This option seems perfectly reasonable for sensor networking scenarios in which sensors collect data over 
extended periods of time, and must then transmit their accumulated measurements simultaneously. In this 
case, a key assumption in the design of standard random access techniques for multiaccess communication 
breaks down — the fact that individual nodes will transmit with low probability [8, Chapter 4]. As a result, 
classical random access would result in too many collisions and hence low throughput. Alternatively, instead 
of mitigating interference, a medium access control (MAC) protocol could attempt to exploit it, in the form 
of using cooperation among nodes to generate waveforms that add up constructively at the receiver (cf. [9], 
[10], [11]). Providing an information-theoretic analysis of such cooperation mechanisms would be very
desirable. Since that analysis entails dealing with correlated sources and a general multiple access channel,
the problem of correlated sources and an array of independent channels constitutes a reasonable first step
towards that goal. It is also interesting in its own right, since it provides the ultimate performance limits
for an important class of sensor networking problems.

3) Perfect Reconstruction at the Receiver: In our formulation of the sensor reachback problem, the 
far receiver is interested in reconstructing the entire field of sensor measurements with arbitrarily small 
probability of error. This formulation leads us to a natural capacity problem, in the classical sense of Shannon. 
Alternatively, one could relax the condition of perfect reconstruction, and tolerate some distortion in the 
reconstruction of the field of measurements at the far receiver, thus leading to the so called Multiterminal 
Source Coding problem studied by Berger [12]. This condition could be further relaxed, to require a faithful
reproduction of the image of some function $f$ of the sources, leading to a problem studied extensively by
Csiszár, Körner and Marton [13], [14].

B. An Information Theoretic View of Architectural Issues 

For large-scale, complex systems of the type of interest in this work, the complexity of basic questions 
of design and performance analysis appears daunting: 

• How should nodes cooperate to relay messages to the data collector node $v_0$? Should they decode
received messages, re-encode them, and forward to other nodes? Should they map channel outputs to 
channel inputs without attempting to decode? Should they do something else? 

• How should redundancy among the sources be exploited? Should we compress the information as 
much as possible? Should we leave some of that redundancy to combat noise in the channels? Is there 
a source/channel separation theorem in these networks? 

• How do we measure performance of these networks, what are appropriate cost metrics? How do we 
design networks that are efficient under an appropriate cost metric? 

In [15], a number of examples are identified in which the existence of a simple architecture has played an 
enabling role in the proliferation of technology: the von Neumann computer architecture, separation of source
and channel coding in communications, separation of plant and controller in control systems, and the OSI 
layered architecture model. So what all these questions boil down to is an issue similar to those considered 
in [15]: what are appropriate abstractions of the network, similar to the IP protocol stack for the Internet, 
based on which we can break the design task into independent reusable components, optimize the design of 
these components, and obtain an efficient system as a result? In this work, we show how information theory 
is indeed capable of providing very meaningful answers to this problem. 

Information theory, in one of its applications, deals with the analysis of performance of communication
systems. So, to some it may seem the natural theory to turn to for guidance in the task of searching for
a suitable network architecture. However, to others it may seem unnatural to do so: it is well known that
information theory and communication networks have not had fruitful interactions in the past, as explained
by Ephremides and Hajek [16]. Thus, in the presence of these mixed indicators, we take the stand that
indeed information theory has a great deal to offer in the task at hand. And to justify our position, consider
Shannon's model for a communications system, as illustrated in Fig. 2.

October 2, 2005. DRAFT

Fig. 2. Shannon's model for a point-to-point system. Top figure: abstract view, consisting of a source, an encoder from
source symbols to channel symbols, a conditional probability distribution to model the random dependence of outputs 
on inputs, and a decoder to map from received messages back to source symbols; bottom figure: a capacity-achieving 
architecture for this system, in which error control codes are used to create an illusion of a noiseless bit pipe. 



For this setup, Shannon established that reliable communication of a source over a noisy channel is possible 
if and only if the entropy rate of the source is less than the capacity of the channel [5, Ch. 8.13]. This 
result, known as the source/channel separation theorem, has a double significance. On one hand, it provides 
an exact single-letter characterization of conditions under which reliable communication is possible. On the 
other hand, and of particular interest to the task at hand for us, it is a statement about the architecture of an 
optimal communication system: the encoder/decoder design task can be split into the design and optimization 
of two independent components. So it is inspired by Shannon's teachings for point-to-point systems that 
we ask in this work, and answer in the affirmative, the question of whether it is possible or not to derive 
similar useful architectural guidelines for the class of networks under consideration. 
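
As a small numerical illustration of Shannon's condition, the following sketch checks whether the entropy of a source is below the capacity of a channel. It is not taken from this paper: it assumes a Bernoulli source, a binary symmetric channel, and one channel use per source symbol, and the function names are hypothetical.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy, in bits, of a Bernoulli(p) random variable."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def bsc_capacity(eps: float) -> float:
    """Capacity, in bits per channel use, of a binary symmetric channel
    with crossover probability eps (assumed channel model)."""
    return 1.0 - binary_entropy(eps)


def reliably_transmissible(p_source: float, eps: float) -> bool:
    """Separation condition H(U) < C, assuming one channel use per source symbol."""
    return binary_entropy(p_source) < bsc_capacity(eps)


if __name__ == "__main__":
    # A biased source fits through a mildly noisy channel; a fair coin does not.
    print(reliably_transmissible(0.1, 0.05))
    print(reliably_transmissible(0.5, 0.05))
```

When the condition holds, the architectural reading of the theorem is that the source can first be compressed to about H(U) bits per symbol and the resulting bits then protected by a channel code designed without knowledge of the source.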

C. Related Work 

The problem of communicating distributed correlated sources over a network of point-to-point links is 
closely related to several classical problems in network information theory. To set the stage for the main 
contributions of this paper, we now review related previous work.

1) Distributed Correlated Sources and Multiple Access: The concept of separate encoding of correlated
sources was studied by Slepian and Wolf in their seminal paper [17], where they proved that two correlated
sources $(U_1 U_2)$ drawn i.i.d. $\sim p(u_1 u_2)$ can be compressed at rates $(R_1, R_2)$ if and only if

$$\begin{aligned}
R_1 &> H(U_1 | U_2) \\
R_2 &> H(U_2 | U_1) \\
R_1 + R_2 &> H(U_1 U_2).
\end{aligned}$$
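
To make the three Slepian-Wolf constraints concrete, the sketch below evaluates them for a joint distribution given as a table. This is an illustration only: the dictionary representation of $p(u_1 u_2)$, the function names, and the doubly symmetric binary example are assumptions, not part of [17] or of this paper.

```python
import math
from typing import Dict, Tuple

Pmf = Dict[Tuple[int, int], float]  # hypothetical representation of p(u1, u2)


def entropy(probs) -> float:
    """Entropy, in bits, of a collection of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def slepian_wolf_bounds(pmf: Pmf) -> Tuple[float, float, float]:
    """Return (H(U1|U2), H(U2|U1), H(U1 U2)) for a joint pmf."""
    p1: Dict[int, float] = {}
    p2: Dict[int, float] = {}
    for (u1, u2), p in pmf.items():
        p1[u1] = p1.get(u1, 0.0) + p
        p2[u2] = p2.get(u2, 0.0) + p
    h12 = entropy(pmf.values())
    # Chain rule: H(U1|U2) = H(U1 U2) - H(U2), and symmetrically for U2.
    return h12 - entropy(p2.values()), h12 - entropy(p1.values()), h12


def in_slepian_wolf_region(r1: float, r2: float, pmf: Pmf) -> bool:
    """Check the three rate constraints with strict inequalities, as stated above."""
    h1_given_2, h2_given_1, h12 = slepian_wolf_bounds(pmf)
    return r1 > h1_given_2 and r2 > h2_given_1 and r1 + r2 > h12


if __name__ == "__main__":
    # Two fair bits that disagree with probability 0.1 (illustrative source).
    dsbs = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
    print(slepian_wolf_bounds(dsbs))
    print(in_slepian_wolf_region(0.5, 1.0, dsbs))
```

In this example the sum-rate bound is roughly 1.47 bits, so the pair (0.5, 1.0) is admissible even though the first encoder operates well below the one bit per symbol that $U_1$ would need if compressed on its own.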

Assume now that $(U_1 U_2)$ are to be transmitted with arbitrarily small probability of error to a joint
receiver over a multiple access channel with transition probability $p(y | x_1 x_2)$. Knowing that the capacity
of the multiple access channel with independent sources is given by the convex hull of the set of points
$(R_1, R_2)$ satisfying [5, Ch. 14.3]

$$\begin{aligned}
R_1 &< I(X_1; Y | X_2) \\
R_2 &< I(X_2; Y | X_1) \\
R_1 + R_2 &< I(X_1 X_2; Y),
\end{aligned}$$

it is not difficult to prove that Slepian-Wolf source coding of $(U_1 U_2)$ followed by separate channel coding
yields the following sufficient conditions for reliable communication

$$\begin{aligned}
H(U_1 | U_2) &< I(X_1; Y | X_2) \\
H(U_2 | U_1) &< I(X_2; Y | X_1) \\
H(U_1 U_2) &< I(X_1 X_2; Y).
\end{aligned}$$

These conditions, which basically state that the Slepian-Wolf region and the capacity region of the multiple 
access channel have a non-empty intersection, are sufficient but not necessary for reliable communication, 
as shown by Cover, El Gamal, and Salehi with a simple counterexample in [18]. In that same paper, the 
authors introduce a class of correlated joint source/channel codes, which enables them to increase the region 
of achievable rates to 

$$H(U_1 | U_2) < I(X_1; Y | X_2 U_2) \qquad (1)$$

$$H(U_2 | U_1) < I(X_2; Y | X_1 U_1) \qquad (2)$$

$$H(U_1 U_2) < I(X_1 X_2; Y), \qquad (3)$$

for some $p(u_1 u_2 x_1 x_2 y) = p(u_1 u_2) \cdot p(x_1 | u_1) \cdot p(x_2 | u_2) \cdot p(y | x_1 x_2)$. Also in [18], the authors generalize this
set of sufficient conditions to sources $(U_1 U_2)$ with a common part $W = f(U_1) = g(U_2)$, but they were not
able to prove a converse, i.e., they were not able to show that their region is indeed the capacity region
of the multiple access channel with correlated sources. Later, it was shown with a carefully constructed
example by Dueck in [19] that indeed the region defined by eqns. (1)-(3) is not tight. Related problems
were considered by Slepian and Wolf [20], and Ahlswede and Han [21]. To this date however, the general 
problem still remains open. 

Assuming independent sources, Willems investigated a cooperative scenario, in which encoders exchange 
messages over conference links of limited capacity prior to transmission over the multiple access channel [22]. 
In this case, the capacity region is given by 

$$\begin{aligned}
R_1 &< I(X_1; Y | X_2 Z) + C_{12} \\
R_2 &< I(X_2; Y | X_1 Z) + C_{21} \\
R_1 + R_2 &< \min\{ I(X_1 X_2; Y | Z) + C_{12} + C_{21},\; I(X_1 X_2; Y) \},
\end{aligned}$$

for some auxiliary random variable $Z$ such that $|\mathcal{Z}| \le \min(|\mathcal{X}_1| \cdot |\mathcal{X}_2| + 2, |\mathcal{Y}| + 3)$, and for a joint distribution
$p(z x_1 x_2 y) = p(z) \cdot p(x_1 | z) \cdot p(x_2 | z) \cdot p(y | x_1 x_2)$.

2) Correlated Sources and Networks of DMCs: Very recently, an early paper was brought to our attention,
in which Han considers the transmission of correlated sources to a common sink over a network of
independent channels [23]. Although the problem setup is less general than ours, in that (a) each source
block and each transmitted codeword participate only once in the encoding process, and (b) the intermediate
nodes are assumed to decode the data before passing it on, Theorem 3.1 of [23] is very similar to our 
Theorem 1. 

Our work, done independently of Han's, differs from it and complements it in the following ways: 

• Our setup is more general. We allow for arbitrary forms of joint source-channel coding to take place 
inside the network while data flows towards the decoder, and then prove that a one-step encoding process, 
pure routing, and separate source/channel coding are sufficient. Han assumes decode-and-forward in 
his problem statement, as well as a one-step encoding process. 

• The proof techniques are different. Han takes a purely combinatorial approach to the problem: he 
thoroughly exploits the polymatroidal structure of the capacity function for the network of channels, 
and the co-polymatroidal structure for the Slepian-Wolf region. We establish our achievability result by
explicitly constructing a routing algorithm for the Slepian-Wolf indices, and our converse by standard
methods based on Fano's inequality.

Furthermore our work, being motivated by a concrete sensor networking application, establishes connections
and relevance to practical engineering problems (see Section III) that are not a concern in [23].

3) Network Coding: Another closely related problem is the well known network coding problem, introduced
by Ahlswede, Cai, Li and Yeung [24]. In that work, the authors establish the need for applying coding
operations at intermediate nodes to achieve the max-flow/min-cut bound of a general multicast network. A
converse proof for this problem was provided by Borade [25]. Linear codes were proposed by Li, Yeung
and Cai in [26], and Koetter and Médard in [27].

Effros, Médard et al. have developed a comprehensive study on separate and joint design of linear source,
channel and network codes for networks with correlated sources under the assumption that all operations 
are defined over a common finite field [28]. For this particular case, optimality of separate linear source 
and channel coding was observed in the one-receiver instance, but the result of [28] does not prove that 
it holds for general networks and channels with arbitrary input and output alphabets. Error exponents for 
multicasting of correlated sources over a network of noiseless channels were given by Ho, Médard et al.
in [29], and networks with undirected links were considered by Li and Li in [30]. 

Another problem in which network flow techniques have been found useful is that of finding the maximum 
stable throughput in certain networks. In this problem, posed by Gupta and Kumar in [31], it is sought to 
determine the maximum rate at which nodes can inject bits into a network, while keeping the system stable. 
This problem was reformulated by Peraki and Servetto as a multicommodity flow problem, for which tight 
bounds were obtained using elementary counting techniques [32], [33]. 

D. Main Contributions and Organization of the Paper 

Our main original contributions can be summarized as follows: 

• A general coding theorem yielding necessary and sufficient conditions for reliable communication of 
M + 1 correlated sources to a common sink over a network of independent DMCs. 

• An achievability proof which combines classical coding arguments with network flow methods and a 
converse proof that establishes the optimality of separate source and channel coding. 

• A detailed discussion on the engineering implications of our main result, and the concepts of information- 
theoretically optimal network architectures and protocol stacks. 

The rest of the paper is organized as follows. In Section II we give formal definitions, to then state and 
prove our main theorem. We also look at three special cases: a network with three nodes, the non-cooperative 
case, and an array of orthogonal Gaussian channels. In Section III we address the practical implications of our 
main result, by describing an information-theoretically optimal protocol stack, elaborating on the tractability 
of related network architecture and network optimization problems, and discussing the suboptimality of 
correlated codes for orthogonal channels. The paper concludes with Section IV.

II. A Coding Theorem for Network Information Flow with Correlated Sources
A. Formal Definitions and Statement of the Main Theorem

A network is modeled as the complete graph on $M + 1$ nodes. For each $(v_i, v_j) \in E$ ($0 \le i, j \le M$), there
is a discrete memoryless channel $(\mathcal{X}_{ij}, p_{ij}(y|x), \mathcal{Y}_{ij})$, with capacity $C_{ij} = \max_{p_{ij}(x)} I(X_{ij}; Y_{ij})$.$^1$ At each
node $v_i \in V$, a random variable $U_i$ is observed ($i = 0 \ldots M$), drawn i.i.d. from a known joint distribution
$p(U_0 U_1 \ldots U_M)$. Node $v_0$ is the decoder: the goal in this problem is to find conditions under which $U_1 \ldots U_M$
can be reproduced reliably at $v_0$. We now make this statement more precise, by describing how the nodes
communicate and by giving the formal definitions of code, probability of error and reliable communication.

Time is discrete. Every $N$ time steps, node $v_i$ collects a block $U_i^N$ of source symbols; we refer to the
collection of all blocks $[U_0^N(k)\, U_1^N(k) \ldots U_M^N(k)]$ collected at time $kN$ ($k \ge 1$) as a block of snapshots. Node
$v_i$ then sends a codeword to node $v_j$. This codeword depends on a window of $K$ previous blocks of source
sequences $U_i^N$ observed at node $v_i$, and of $T$ previously received blocks of channel outputs, corresponding
to noisy versions of the codewords sent by all nodes to node $v_i$ in the previous $T$ communication steps
(corresponding to $NT$ time steps).

For a block of snapshots observed at time $kN$, at time $(k + W)N$ (that is, after allowing for a finite
but otherwise arbitrary amount of time to elapse,$^2$ in which the information injected by all nodes reaches
$v_0$), an attempt is made to decode at $v_0$. The decoder produces an estimate of the block of snapshots
$[U_0^N(k)\, U_1^N(k) \ldots U_M^N(k)]$ based on the local observations $U_0^N(k)$, and the previous $W$ blocks of $N$ channel
outputs generated by codewords sent to $v_0$ by the other nodes.

Thus, a code for this network consists of: 

• four integers N, K, T and W; 

• encoding functions at each node 

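$$g_{ij} : \bigotimes_{l=1}^{K} \mathcal{U}_i^N \times \bigotimes_{t=1}^{T} \bigotimes_{m=0}^{M} \mathcal{Y}_{mi}^N \to \mathcal{X}_{ij}^N,$$

for $0 \le i, j \le M$.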

$^1$ Note that $C_{ij}$ could potentially be zero, thus assuming a complete graph does not mean necessarily that any node can send
messages to any other node in one hop.

$^2$ During the time that a block of snapshots spends within the network, arbitrarily complex coding operations are allowed within
the pipeline: nodes can exchange information, redistribute their load, and in general perform any form of joint source-channel coding
operations. The only constraint imposed is that all information eventually be delivered to destination, within a finite time horizon.

• the decoding function at node $v_0$, a mapping

$$\bigotimes_{w=1}^{W} \bigotimes_{m=1}^{M} \mathcal{Y}_{m0}^N \times \mathcal{U}_0^N \to \bigotimes_{m=1}^{M} \mathcal{U}_m^N;$$

• the block probability of error:

$$P_e^{(N)} = P\big( \hat{U}_1^N \ldots \hat{U}_M^N \neq U_1^N \ldots U_M^N \big).$$

We say that blocks of snapshots $U_1^N \ldots U_M^N$ can be reliably communicated to $v_0$ if there exists a sequence
of codes as above, with $P_e^{(N)} \to 0$ as $N \to \infty$, for some finite values $K$, $T$ and $W$, all independent of $N$.
With these definitions, we are now ready to state our main theorem.

Theorem 1: Let $S$ denote a non-empty subset of node indices that does not contain node 0: $S \subseteq \{0 \ldots M\}$,
$S \neq \emptyset$, $0 \in S^c$. Then, it is possible to communicate $U_1 \ldots U_M$ reliably to $v_0$ if and only if, for all $S$ as above,

$$H(U_S | U_{S^c}) < \sum_{i \in S,\, j \in S^c} C_{ij}. \qquad (4)$$
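
For small networks, condition (4) can be verified by brute force over all cuts. The sketch below is one possible way to do so and is not part of the paper: it assumes the joint distribution is given as a dictionary keyed by tuples $(u_0, \ldots, u_M)$ and the capacities as a matrix `cap[i][j]`, both hypothetical representations, and it enumerates all $2^M - 1$ admissible sets $S$, so it is only practical for small $M$.

```python
import itertools
import math
from typing import Dict, Sequence, Tuple


def joint_entropy(pmf: Dict[Tuple[int, ...], float], idx: Sequence[int]) -> float:
    """Entropy, in bits, of the sub-vector of (U_0 ... U_M) indexed by idx."""
    marginal: Dict[Tuple[int, ...], float] = {}
    for u, p in pmf.items():
        key = tuple(u[i] for i in idx)
        marginal[key] = marginal.get(key, 0.0) + p
    return -sum(p * math.log2(p) for p in marginal.values() if p > 0)


def reachback_feasible(pmf: Dict[Tuple[int, ...], float],
                       cap: Sequence[Sequence[float]]) -> bool:
    """Check H(U_S | U_{S^c}) < sum_{i in S, j in S^c} C_ij for every
    non-empty S that excludes node 0 (the sink)."""
    n = len(cap)  # n = M + 1 nodes
    everyone = tuple(range(n))
    h_all = joint_entropy(pmf, everyone)
    for size in range(1, n):
        for s in itertools.combinations(range(1, n), size):
            s_c = tuple(i for i in everyone if i not in s)
            # Chain rule: H(U_S | U_{S^c}) = H(U_0 ... U_M) - H(U_{S^c}).
            h_cond = h_all - joint_entropy(pmf, s_c)
            cut = sum(cap[i][j] for i in s for j in s_c)
            if not h_cond < cut:
                return False
    return True


if __name__ == "__main__":
    # Illustrative three-node example: U_0 is constant, U_1 and U_2 are fair
    # bits that disagree with probability 0.1.
    pmf = {(0, 0, 0): 0.45, (0, 0, 1): 0.05, (0, 1, 0): 0.05, (0, 1, 1): 0.45}
    cap = [[0.0, 0.0, 0.0],
           [0.5, 0.0, 0.2],   # C_10, C_12
           [1.0, 0.0, 0.0]]   # C_20
    print(reachback_feasible(pmf, cap))
```

In the example, the direct link from node 1 to the sink (0.5 bits) is too weak to carry an independent encoding of $U_1$ (1 bit), yet all three cut conditions hold because the correlation with $U_2$ lowers the conditional entropies.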



B. Achievability Proof 

Our coding strategy is based on separate source and channel coding. We first use capacity attaining channel 
codes to turn the noisy network into a network of noiseless links (of capacity $C_{ij}$). Then, we use Slepian-Wolf
source codes, jointly with a custom designed routing algorithm, to deliver all this data to destination.
Since the channel coding aspects of the proof are rather straightforward extensions of classical point-to-point 
arguments, in the following we only focus on the less obvious source coding and routing aspects. 

1) Mechanics of the Coding Strategy: Consider a "noise-free" version of the problem formulated above:
we still have a complete graph, now with noiseless links of capacity $C_{ij}$. Variables $U_i$ are still observed at
each node $v_i$, and the goal remains to reproduce all of these at $v_0$. Each node uses a classical Slepian-Wolf
code: there is a source encoder at node $v_i$ that maps a sequence $U_i^N$ to an index from the random binning
set $\{1, 2, \ldots, 2^{N R_i}\}$, thus compressing the block of observations $U_i^N$ using codes as in [5, Thm. 14.4.2].
Let $(R_1 \ldots R_M)$ denote the rate allocation to each of the nodes. To achieve perfect reconstruction, these bits
must be delivered to node $v_0$.

• Set K = T = 1 - each block of source symbols and each block of codewords participates in the 
encoding process only once. 

• To deliver the bin indices produced by the Slepian-Wolf codes to destination, the noise-free network
is regarded as a flow network [34, Ch. 26]. Let $\varphi(v_i, v_j)$ be a feasible flow in this network, with $M$
sources $v_1 \ldots v_M$, supply $R_i$ at source $v_i$, and a single sink $v_0$. If no such feasible flow exists, the code
construction fails.

• If there is a feasible flow $\varphi$ then this $\varphi$ uniquely determines, at each node $v_i$, the number of bits that
need to be sent to each of its neighbors; thus from $\varphi$ we derive the encoding functions $g_{ij}$ as follows:
- Consider the directed acyclic graph $G'$ of $G$ induced by $\varphi$, by taking $V(G') = V(G)$, and
$E(G') = \{(v_i, v_j) \in E : \varphi(v_i, v_j) > 0\}$. Define a permutation $\pi : \{0 \ldots M\} \to \{0 \ldots M\}$, such
that $[v_{\pi(0)} v_{\pi(1)} \ldots v_{\pi(M)}]$ is a topological sort of the nodes in $G'$, as illustrated in Fig. 3.



Fig. 3. A topological sort of the nodes of a directed acyclic graph is a linear ordering $v_1 \ldots v_M$ such that if $(v_i, v_j)$ is
an edge, then $i < j$.

- Consider a block of snapshots $U^N(k) = [U_0^N(k)\, U_1^N(k) \ldots U_M^N(k)]$ captured at time $kN$. At time
$(k + l)N$ (for $l = 0 \ldots M$), node $v_{\pi(l)}$ will have received all bits with portions of the encodings of
$U^N(k)$ generated by nodes upstream in the topological order; thus, together with its own encoding
of $U_{\pi(l)}^N(k)$, all the bits for $U^N(k)$ up to and including node $v_{\pi(l)}$ will be available there, and thus
can be routed to nodes downstream in the topological order.

- Consider now all edges of the form $(v', v_{\pi(k)})$ for which $\varphi(v', v_{\pi(k)}) > 0$:

1) Collect the $m = \sum_{v'} \varphi(v', v_{\pi(k)})$ information bits sent by the upstream nodes $v'$.

2) Consider now the set of all downstream nodes $v''$, for which $\varphi(v_{\pi(k)}, v'') > 0$. Due to flow
conservation for $\varphi$, $\sum_{v''} \varphi(v_{\pi(k)}, v'') = m + R_{\pi(k)}$, where $R_{\pi(k)}$ is the rate allocated to node
$v_{\pi(k)}$.

3) For each $v''$ as above, define $g_{\pi(k) v''}$ to be a message such that $|g_{\pi(k) v''}| = \varphi(v_{\pi(k)}, v'')$. Partition
the $m + R_{\pi(k)}$ available bits according to the values of $\varphi$, and send them downstream, as
illustrated in Fig. 4.

• To decode, at time $(k + M)N$, node $v_0$ does the following:

- Decode all channel outputs received at time $(k + M - 1)N$, to recover the bits sent by each 1-hop
neighbor of the sink.

- Reassemble the set of bin indices from the segments received from each neighbor.



October 2, 2005. 



DRAFT 



12 



Fig. 4. To illustrate the operations performed at each node. In this example, five bits ($b_1 \ldots b_5$) come into node $v_{\pi(k)}$ from
neighbouring nodes, two on the top link and three on the bottom link. The information bits from other nodes come
in the form of noisy codewords; they need to be decoded from the received channel outputs. Now, because flow
conservation holds for $\varphi$, we know that the aggregate capacity of the three output links will be at least five bits plus
some local bits (the encoding of a block of local observations $U_{\pi(k)}^N$, denoted by $b_6$ and $b_7$ here). So at this point we
split those bits in a way such that the individual capacity constraints of the output links are not violated, and then they
are sent on their way to $v_0$.



- Perform typical set decoding (as in [5, pg. 411]), to recover the block of snapshots $[U_1^N(k) \ldots U_M^N(k)]$.

An important observation is that, in this setup, network coding (in the sense of [24]) is not needed. This
is because we have a case of $M$ sources and a single sink interested in collecting all messages, a case for
which it was shown in [35] that routing alone suffices.
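
The store-and-forward mechanics described above can be sketched in a few lines. The code below is an illustration under simplifying assumptions, not the construction used in the proof: the flow is assumed integral (whole bits per block) and acyclic, each Slepian-Wolf bit is represented by a hypothetical `(node, position)` label, and the topological sort uses Kahn's algorithm. The names `topological_order` and `route_bits` are made up for this example.

```python
from collections import defaultdict, deque
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def topological_order(nodes: List[int], flow: Dict[Edge, int]) -> List[int]:
    """Kahn's algorithm on the graph formed by edges carrying positive flow."""
    indegree = {v: 0 for v in nodes}
    successors = defaultdict(list)
    for (i, j), f in flow.items():
        if f > 0:
            successors[i].append(j)
            indegree[j] += 1
    queue = deque(v for v in nodes if indegree[v] == 0)
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in successors[v]:
            indegree[w] -= 1
            if indegree[w] == 0:
                queue.append(w)
    if len(order) != len(nodes):
        raise ValueError("the flow contains a directed cycle")
    return order


def route_bits(nodes: List[int], sink: int, rates: Dict[int, int],
               flow: Dict[Edge, int]) -> List[Tuple[int, int]]:
    """Forward labelled bits to the sink, splitting them at each node
    according to the flow values on its outgoing edges."""
    # Each node starts with its own rates[v] locally generated bits.
    buffers = {v: [(v, b) for b in range(rates.get(v, 0))] for v in nodes}
    for v in topological_order(nodes, flow):
        if v == sink:
            continue
        pending = buffers[v]  # upstream bits plus local bits
        outgoing = [(j, f) for (i, j), f in flow.items() if i == v and f > 0]
        if sum(f for _, f in outgoing) != len(pending):
            raise ValueError(f"flow conservation violated at node {v}")
        start = 0
        for j, f in outgoing:
            buffers[j].extend(pending[start:start + f])
            start += f
        buffers[v] = []
    return sorted(buffers[sink])


if __name__ == "__main__":
    # Node 1 produces 3 bits and node 2 produces 2 bits; node 1 sends one bit
    # directly to the sink (node 0) and relays the other two through node 2.
    print(route_bits([0, 1, 2], 0, {1: 3, 2: 2}, {(1, 0): 1, (1, 2): 2, (2, 0): 4}))
```

Because nodes are processed in topological order, every node has already received all upstream bits when its turn comes, which mirrors the pipelined schedule in which node $v_{\pi(l)}$ acts at time $(k + l)N$.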

Our next task is to find conditions under which this coding strategy results in $P_e^{(N)} \to 0$ as $N \to \infty$.

2) Analysis of the Probability of Error: The coding strategy proposed above hinges on two main elements: 

• Slepian-Wolf codes: in this case, we know that provided the rate vector $(R_1 \ldots R_M)$ is such that, for all
partitions $S$ of $\{0 \ldots M\}$, $S \neq \emptyset$, $0 \in S^c$,

$$\sum_{i \in S} R_i > H(U_S | U_{S^c}), \qquad (5)$$

then there exist Slepian-Wolf codes with arbitrarily low probability of error [5, Ch. 14.4].

• Network flows: from elementary flow concepts we know that if a flow $\varphi$ is feasible in a network $G$,
then for all $S \subseteq \{0 \ldots M\}$, $S \neq \emptyset$, $0 \in S^c$,

$$\sum_{i \in S} R_i \overset{(a)}{=} \sum_{i \in S,\, j \in V} \varphi(v_i, v_j) \overset{(b)}{=} \sum_{i \in S,\, j \in S^c} \varphi(v_i, v_j) \overset{(c)}{\le} \sum_{i \in S,\, j \in S^c} C_{ij}, \qquad (6)$$

where (a) and (b) follow from the flow conservation properties of a feasible flow (all the flow injected
by the sources has to go somewhere in the network, and in particular all of it has to go across a network
cut with the destination on the other side); and (c) follows from the fact that in any flow network, the
capacity of any cut is an upper bound to the value of any flow.

Thus, from (5) and (6), we conclude that if, for all partitions $S$ as above, we have that

$$H(U_S | U_{S^c}) < \sum_{i \in S,\, j \in S^c} C_{ij}, \qquad (7)$$

then $P_e^{(N)} \to 0$ as $N \to \infty$.
C. Converse Proof 

The converse proof is fairly long and tedious, but by virtue of being based on Fano's inequality and 
standard information-theoretic arguments, it is relatively straightforward - therefore, we omit it here and 
provide the technical details in Appendix A. At this point however, we would like to sketch out an informal 
argument on why this converse should hold. 

Consider an arbitrary network partition $S$ of $\{0 \ldots M\}$, $S \neq \emptyset$, $0 \in S^c$. For each such partition we define
a two-terminal system, with a "supersource" that has access to the whole vector of observations $U_1 \ldots U_M$,
and a "supersink" that has access only to $U_{S^c}$. The supersource and supersink are connected by an array of
parallel DMCs: if $i \in S$ and $j \in S^c$, then $(\mathcal{X}_{ij}, p_{ij}(y|x), \mathcal{Y}_{ij})$ from the network is one of the channels in
the array. This is illustrated in Fig. 5.




Fig. 5. An artificial two-terminal system: all sources in S are treated as a supersource, connected to a supersink made 
of all the sinks in $S^c$ by an array of DMCs (those going across the cut). Intuitively, any necessary condition for this
system should also be necessary for our system (although this requires a formal statement and proof). The interesting 
statement thus is to show that the set of all conditions obtained in this form (by considering all possible cuts) is also 
sufficient. 



Clearly, $H(U_S | U_{S^c}) < \sum_{i \in S,\, j \in S^c} C_{ij}$ is an outer bound for this two-terminal system (follows directly
from the source/channel separation theorem, [5, Sec. 8.13]). And intuitively, it is also clear that any outer
bound for this two-terminal system provides necessary conditions for reliable communication to be possible
in our network. Thus, by considering all possible partitions $(S, S^c)$ as above, we obtain a set of necessary
conditions matching those of the achievability result.$^3$

We would also like to highlight that, because of the correlation between sources, a simple max-flow/min-cut
bounding argument (as suggested in [5, Section 14.10]) is not sufficient to establish the source-channel
separation result we seek; proving said result requires all the steps of a typical converse.

A formal proof for this converse is provided in Appendix A. 

D. Special Cases 

1) A Network with Three Nodes: To provide an illustration of the meaning of Theorem 1, and of the
optimality of the flow-based solution, we specialize Theorem 1 to the case of a network with three nodes.
In this case, those conditions become:

$$H(U_1|U_2U_0) < C_{10} + C_{12} \tag{8}$$
$$H(U_2|U_1U_0) < C_{20} + C_{21} \tag{9}$$
$$H(U_1U_2|U_0) < C_{10} + C_{20}. \tag{10}$$
A network with three nodes as considered here is illustrated in Fig. 6. 
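
Conditions (8)-(10) are the cuts $S = \{1\}$, $\{2\}$ and $\{1,2\}$ of Theorem 1. Purely as an illustration, the following Python sketch enumerates these cuts for a small joint distribution and checks each one numerically. The function names, the dictionary layout used for the joint pmf and for the capacities, and the toy source (a constant $U_0$ and two independent fair bits $U_1$, $U_2$) are assumptions made for this example; they are not part of the proof.

```python
from itertools import combinations, product
from math import log2


def conditional_entropy(pmf, S, n):
    """Return H(U_S | U_{S^c}) in bits; pmf maps n-tuples (u_0, ..., u_{n-1}) to probabilities."""
    Sc = [i for i in range(n) if i not in S]
    marginal = {}  # marginal distribution p(u_{S^c})
    for u, p in pmf.items():
        key = tuple(u[i] for i in Sc)
        marginal[key] = marginal.get(key, 0.0) + p
    h = 0.0
    for u, p in pmf.items():
        if p > 0:
            key = tuple(u[i] for i in Sc)
            h -= p * log2(p / marginal[key])
    return h


def cut_conditions_hold(pmf, cap, n):
    """Check H(U_S | U_{S^c}) < sum_{i in S, j not in S} C_ij for every non-empty S without node 0."""
    for size in range(1, n):
        for S in combinations(range(1, n), size):
            cut = sum(c for (i, j), c in cap.items() if i in S and j not in S)
            if not conditional_entropy(pmf, S, n) < cut:
                return False
    return True


# Toy source (assumed): U_0 is constant, U_1 and U_2 are independent fair bits.
pmf = {(0, a, b): 0.25 for a, b in product((0, 1), repeat=2)}

# Hypothetical capacities C_ij in bits per channel use, keyed by (i, j).
caps_ok = {(1, 0): 1.5, (1, 2): 0.0, (2, 0): 1.5, (2, 1): 0.0}
caps_bad = {(1, 0): 0.5, (1, 2): 1.0, (2, 0): 1.2, (2, 1): 0.0}

feasible_ok = cut_conditions_hold(pmf, caps_ok, 3)    # all three cuts are satisfied
feasible_bad = cut_conditions_hold(pmf, caps_bad, 3)  # cut S = {1, 2} fails: 2 bits vs. 1.7
```

In the second set of capacities each single-node cut is satisfied, yet the joint cut (10) is violated, which shows why all subsets have to be examined.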




Fig. 6. A network with three nodes. 

Next, we regard the network in Fig. 6 as a flow network [34, Ch. 26]: a flow network with two sources
($v_1$ and $v_2$) and a single sink ($v_0$). Encodings of $U_1$ injected at source $v_1$ at rate $R_1$, and of $U_2$ injected
at $v_2$ at rate $R_2$, are the "objects" that flow in this network and are to be delivered to the sink $v_0$. This is
illustrated in Fig. 7.

In the simple flow network of Fig. 7, any feasible flow $\varphi$ must satisfy some conservation equations:

$$
\begin{aligned}
R_1 &= \varphi(v_1,v_0) + \varphi(v_1,v_2), \\
R_2 &= \varphi(v_2,v_0) + \varphi(v_2,v_1), \\
R_1 + R_2 &= \varphi(v_1,v_0) + \varphi(v_1,v_2) + \varphi(v_2,v_0) + \varphi(v_2,v_1) = \varphi(v_1,v_0) + \varphi(v_2,v_0),
\end{aligned}
$$

3 We thank our Reviewer B for suggesting this simple and very clear interpretation for the converse.

Fig. 7. A flow network with three nodes, supplies $R_1$ and $R_2$ at nodes $v_1$ and $v_2$, and a sink $v_0$.



where the last equality follows from the fact that flow conservation holds: the total amount of flow injected
($R_1 + R_2$) must equal the total amount of flow received by the sink ($\varphi(v_1,v_0) + \varphi(v_2,v_0)$) [34]. Similarly,
any feasible flow must also satisfy all capacity constraints:

$$
\begin{aligned}
\varphi(v_1,v_0) + \varphi(v_1,v_2) &\leq C_{10} + C_{12}, \\
\varphi(v_2,v_0) + \varphi(v_2,v_1) &\leq C_{20} + C_{21}, \\
\varphi(v_1,v_0) + \varphi(v_2,v_0) &\leq C_{10} + C_{20}.
\end{aligned}
$$
Combining these last two sets of constraints, and the conditions from the Slepian-Wolf theorem on feasible
$(R_1, R_2)$ pairs, we immediately get

$$
\begin{aligned}
H(U_1|U_2U_0) &< R_1 \leq C_{10} + C_{12}, \\
H(U_2|U_1U_0) &< R_2 \leq C_{20} + C_{21}, \\
H(U_1U_2|U_0) &< R_1 + R_2 \leq C_{10} + C_{20}.
\end{aligned}
$$
It is interesting to observe in this argument that the region of achievable rates forms a convex polytope, 
in which three of its faces come from the Slepian-Wolf conditions, and three come from the capacity 
constraints. This polytope is illustrated in Fig. 8. This polytope plays a central role in our analysis: reliable
communication is possible if and only if $\mathcal{R} \neq \emptyset$. Thus, the view of "information as a flow" in this class of
networks is complete.

Fig. 8. The polytope $\mathcal{R}$ of admissible rates. (Figure labels: axes $R_1$ and $R_2$; marks at $H(U_1|U_2U_0)$,
$C_{10} + C_{12}$, $H(U_2|U_1U_0)$, $C_{20} + C_{21}$ and $C_{10} + C_{20}$.)

2) No Cooperation and No Side Information at $v_0$: We consider now the special case of $M$ non-cooperating
nodes and one sink, as illustrated in Fig. 9. Necessary and sufficient conditions for reliable
communication under this scenario follow naturally from our main theorem by setting $C_{ij} = 0$ for all $j \neq 0$,
and $|\mathcal{U}_0| = 1$.




Fig. 9. M non-cooperating nodes. 



Corollary 1: The sources $U_1, U_2, \ldots, U_M$ can be communicated reliably over an array of independent
channels of capacity $C_{i0}$, $i = 1 \ldots M$, if and only if

$$H(U_S|U_{S^c}) < \sum_{i \in S} C_{i0},$$

for all subsets $S \subseteq \{1, 2, \ldots, M\}$, $S \neq \emptyset$.

An illustration of this corollary for two sources $U_1$ and $U_2$ is shown in Fig. 10.

Fig. 10. Relationship between the Slepian-Wolf region and the capacity region for two independent channels. In the left figure, as
$H(U_1|U_2) < C_{10}$ and $H(U_2|U_1) < C_{20}$ the two regions intersect and therefore reliable communication is possible. The figure on
the right shows the case in which $H(U_2|U_1) > C_{20}$ and there is no intersection between the two regions.

When we have two independent channels with capacities $C_{10}$ and $C_{20}$, the capacity region becomes a rectangle with side
lengths $C_{10}$ and $C_{20}$ [5, Chapter 14.3]. Also shown is the Slepian-Wolf region of achievable rates for
separate encoding of correlated sources. Clearly, $H(U_1U_2) < C_{10} + C_{20}$ is a necessary condition for reliable
communication as a consequence of Shannon's joint source and channel coding theorem for point-to-point
communication. Assuming that this is the case, consider now the following possibilities:

• $H(U_1) < C_{10}$ and $H(U_2) < C_{20}$. The Slepian-Wolf region and the capacity region intersect, so any
point $(R_1, R_2)$ in this intersection makes reliable communication possible. Alternatively, we can argue
that reliable transmission of $U_1$ and $U_2$ is possible even with independent decoders, therefore a joint
decoder will also achieve an error-free reconstruction of the source.

• $H(U_1) > C_{10}$ and $H(U_2) > C_{20}$. Since $H(U_1U_2) < C_{10} + C_{20}$ there is always at least one point
of intersection between the Slepian-Wolf region and the capacity region, so reliable communication is
possible.

• $H(U_1) < C_{10}$ and $H(U_2) > C_{20}$ (or vice versa). If $H(U_2|U_1) < C_{20}$ (or if $H(U_1|U_2) < C_{10}$) then the
two regions will intersect. On the other hand, if $H(U_2|U_1) > C_{20}$ (or if $H(U_1|U_2) > C_{10}$), then there
are no intersection points, but it is not immediately clear whether reliable communication is possible
or not (see Fig. 10), since examples are known in which the intersection between the capacity region
of the multiple access channel and the Slepian-Wolf region of the correlated sources is empty and still
reliable communication is possible [18].
Corollary 1 gives a definite answer to this last question: in the special case of correlated sources and 
independent channels an intersection between the capacity region and the Slepian-Wolf rate regions is not 
only sufficient, but also a necessary condition for reliable communication to be possible — in this case, 
separation holds. 
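
For two sources the statement of Corollary 1 is easy to evaluate numerically: the corner $(C_{10}, C_{20})$ of the capacity rectangle dominates every rate pair inside it, so the rectangle meets the Slepian-Wolf region exactly when that corner satisfies the three Slepian-Wolf constraints (boundary cases aside). The Python sketch below checks this for an assumed toy source, a fair bit $U_1$ and $U_2 = U_1 \oplus Z$ with $Z \sim$ Bernoulli(0.1); the function names and the capacity values are hypothetical and chosen only to reproduce the third case in the list above.

```python
from math import log2


def binary_entropy(q):
    """Binary entropy h(q) in bits."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * log2(q) - (1 - q) * log2(1 - q)


def corollary1_holds(h1_given_2, h2_given_1, h12, c10, c20):
    """Corollary 1 for M = 2: the cut conditions for S = {1}, {2} and {1, 2}."""
    return h1_given_2 < c10 and h2_given_1 < c20 and h12 < c10 + c20


# Assumed toy source: U_1 is a fair bit and U_2 = U_1 xor Z with Z ~ Bernoulli(0.1).
q = 0.1
h_cond = binary_entropy(q)   # H(U_1|U_2) = H(U_2|U_1)
h_joint = 1.0 + h_cond       # H(U_1 U_2)

# Third case of the list: H(U_1) = 1 < C_10 while H(U_2) = 1 > C_20 (hypothetical capacities).
case_intersect = corollary1_holds(h_cond, h_cond, h_joint, c10=1.2, c20=0.6)
case_disjoint = corollary1_holds(h_cond, h_cond, h_joint, c10=1.2, c20=0.4)
```

In the second call the sum-rate condition $H(U_1U_2) < C_{10} + C_{20}$ still holds, but $H(U_2|U_1) > C_{20}$, so the regions do not intersect and, by Corollary 1, reliable communication is not possible.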

3) Arrays of Gaussian Channels: We should also mention that Theorem 1 applies to other channel models
that are relevant in practice, for instance Gaussian channels with orthogonal multiple access. For simplicity,
we illustrate this issue in the context of Corollary 1. The capacity of the Gaussian multiple access channel
with $M$ independent sources is given by



$$\sum_{i \in S} R_i < \frac{1}{2} \log\left(1 + \frac{\sum_{i \in S} P_i}{\sigma^2}\right),$$

for all $S \subseteq \{1 \ldots M\}$, $S \neq \emptyset$, and where $\sigma^2$ and $P_i$ are the noise power and the power of the $i$-th user
respectively [5, pp. 378-379]. If we use orthogonal accessing (e.g. TDMA), and assign different time slots
to each of the transmitters, then the Gaussian multiple access channel is reduced to an array of $M$ independent
single-user Gaussian channels each with capacity





$1 \leq i \leq M$,
where $\tau_{i0}$ is the time fraction allocated to source user $i$ to communicate with the data collector node $v_0$,
and $P_{i0}$ is the corresponding power allocation.

Applying Theorem 1, we obtain the reachback capacity of the Gaussian channel with orthogonal accessing. 4
Then, reliable communication is possible if and only if

$$H(U_S|U_{S^c}) < \sum_{i \in S} C_{i0},$$

for all subsets $S \subseteq \{1, 2, \ldots, M\}$, $S \neq \emptyset$.

4 The generalization of Theorem 1 for channels with real-valued output alphabets can be easily obtained using the techniques
in [5, Sec. 9.2 & Ch. 10].

III. Practical/Engineering Implications of Theorem 1

A. An Information Theoretically Optimal Protocol Stack

We believe that the fact that in networks of point-to-point noisy links with one sink Shannon information
has the exact same properties of classical network flows is of particular practical relevance. This is so
because there is a rich algorithmic theory associated with it, which allows us to cast standard information
theoretic problems into the language of flows and optimization. Perhaps most relevant among these is the
optimality of implementing codes using a layered protocol stack, as illustrated in Fig. 11.

As discussed in the Introduction, the decision to turn a wireless network into a network of point-to-point
links is an arbitrary one. But, due to complexity and/or economic considerations, this arbitrary decision is
one made very often, and thus we believe it is of great practical interest to understand what the appropriate
design criteria for such networks are. And our Theorem 1 offers valuable insights in this regard - if we decide
to define a link-layer based on a MAC protocol that deals with interference by suppressing it, then all
remaining layers in Fig. 11 follow from the achievability proof of Theorem 1. We see therefore that indeed,
in this class of networks, Fig. 11 provides a set of abstractions analogous to those of Fig. 2 for classical
two-terminal systems.

B. Algorithmic/Computational Issues

As an illustration of the benefits of the "information as flow" interpretation for our results, in this subsection
we outline some initial results on an optimal routing problem. This topic however will be developed in full
depth elsewhere.

Fig. 11. Abstractions that follow from the achievability proof, illustrated here for three nodes. At the physical layer
there are nodes with power constraints, a data field of which these nodes collect samples in space and time, and a 
gateway node that will deliver all this data to destination. On top of this physical substrate, we construct a sequence 
of abstractions: noiseless point-to-point links of a given capacity (the Link Layer); a flow network (the Network 
Layer); a set of connections (the Transport Layer); and a set of distributed signal processing algorithms for sampling, 
compression and interpolation of the space/time continuous process (the Presentation Layer). In the end, an approximate 
representation of the underlying data field is delivered to applications. 

1) Optimization Aspects of Protocol Design: A natural question that follows from our previous developments
is one of optimization: given a non-empty feasibility polytope $\mathcal{R}$, we have the freedom of choosing
among multiple assignments of values to flow variables, and thus it is only natural to ask if there is an
optimal flow. To this end, we define a cost function $\kappa$ as follows:

$$\kappa(\varphi) = \sum_{(v_i,v_j) \in E} c(v_i,v_j) \cdot \varphi(v_i,v_j),$$

where $c(v_i,v_j)$ is a constant that, multiplied by the total number of bits $\varphi(v_i,v_j)$ that a flow $\varphi$ assigns to
an edge $(v_i,v_j)$, determines the cost of sending all that information over the channel $(\mathcal{X}_{ij}, p_{ij}(y|x), \mathcal{Y}_{ij})$.
The resulting optimization problem is shown in Fig. 12.







$$\min_{\varphi} \; \kappa(\varphi)$$

subject to:

Standard flow constraints (capacity / skew symmetry / flow conservation):

$$
\begin{array}{ll}
\varphi(v_i,v_j) \leq C_{ij}, & 0 \leq i,j \leq M, \\
\varphi(v_i,v_j) = -\varphi(v_j,v_i), & 0 \leq i,j \leq M, \\
\sum_{v \in V} \varphi(v_i,v) = 0, & 1 \leq i \leq M.
\end{array}
$$

Rate admissibility constraints:

$$
\begin{array}{ll}
H(U_S|U_{S^c}) < \sum_{i \in S} \varphi(s,v_i) \leq \sum_{i \in S, j \in S^c} C_{ij}, & S \subseteq \{1 \ldots M\},\ S \neq \emptyset, \\
\varphi(s,v_i) = R_i, & 1 \leq i \leq M.
\end{array}
$$

Fig. 12. Linear programming formulation for the assignment of values to flow variables (observe the introduction of a
"supersource" $s$, which supplies $R_i$ units of flow to $v_i$). A solution to this problem provides optimal routes (those with
positive flow assignment) and loads on each link. Note as well that, by choosing $c(v_i,v_j) = 0$ for all $(v_i,v_j) \in E$,
this LP is solvable if and only if $\mathcal{R} \neq \emptyset$ - that is, the decision problem for reliable communication (i.e., for whether
a given load $p(U_0U_1 \ldots U_M)$ can be carried over a given network $G$) admits a linear programming formulation too.

The choice of a linear cost model in this setup can be justified based on a number of reasons. First of
all, linearity is a very natural assumption: in simple language, it says that it costs twice as much to double
the amount of information sent on any channel. For example, we could take $c(v_i,v_j)$ to be the minimum
energy per information bit required for reliable communication over the DMC from $v_i$ to $v_j$ [36], and then
$\kappa(\varphi)$ would give us the sum of the energy consumed by all nodes when transporting data as dictated by
a particular flow $\varphi$. Specifically in the context of routing problems, another important consideration is that
the main drawback often cited for solving optimal routing problems based on network flow formulations is
given by the fact that cost functions such as $\kappa$ only optimize average levels of link traffic, ignoring other
traffic statistics [8, pg. 436]. But this is not at all an issue here, since the values of flow variables (i.e.,
Shannon information) are already average quantities themselves.
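
For fixed rates $R_i$, the decision version of the problem in Fig. 12 (zero costs) reduces to a maximum-flow computation: attach a supersource $s$ with an edge of capacity $R_i$ to each $v_i$ and ask whether all of $\sum_i R_i$ can reach $v_0$. The following Python sketch does this with the Edmonds-Karp algorithm, which is our choice for the illustration and not a method prescribed by the formulation above; it does not minimize the cost $\kappa$. The helper names and the three-node capacities are hypothetical.

```python
from collections import deque


def max_flow(cap, s, t):
    """Edmonds-Karp maximum flow; cap is a dict of dicts with cap[u][v] = capacity of edge (u, v)."""
    res = {s: {}, t: {}}  # residual capacities
    for u, nbrs in cap.items():
        for v, c in nbrs.items():
            res.setdefault(u, {})
            res.setdefault(v, {})
            res[u][v] = res[u].get(v, 0.0) + c
            res[v].setdefault(u, 0.0)  # reverse residual edge
    total = 0.0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:  # breadth-first search for an augmenting path
            u = queue.popleft()
            for v, c in res[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return total
        bottleneck = float("inf")
        v = t
        while parent[v] is not None:  # smallest residual capacity on the path
            bottleneck = min(bottleneck, res[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:  # push the bottleneck amount along the path
            u = parent[v]
            res[u][v] -= bottleneck
            res[v][u] += bottleneck
            v = u
        total += bottleneck


def rates_routable(cap, rates, sink=0):
    """True if a supersource 's' can inject R_i at every v_i and all of it reaches the sink."""
    graph = {u: dict(nbrs) for u, nbrs in cap.items()}
    graph["s"] = dict(rates)  # edge (s, v_i) with capacity R_i
    return abs(max_flow(graph, "s", sink) - sum(rates.values())) < 1e-9


# Hypothetical three-node network: C_10 = 1, C_12 = 1, C_20 = 2, no link from v_2 to v_1.
cap = {1: {0: 1.0, 2: 1.0}, 2: {0: 2.0}}
routable = rates_routable(cap, {1: 2.0, 2: 1.0})      # R_1 + R_2 = 3 fits the cut into v_0
not_routable = rates_routable(cap, {1: 2.0, 2: 1.5})  # R_1 + R_2 = 3.5 exceeds C_10 + C_20 = 3
```

A full treatment would combine such a routability test with the Slepian-Wolf constraints on $(R_1, \ldots, R_M)$ and, for non-zero costs, with an LP or min-cost-flow solver.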

2) A Routing Example: As one example of the usefulness of the LP formulation in Fig. 12, we consider
next the problem of designing efficient mechanisms for data aggregation, as motivated in [37]. There has
been a fair amount of work reported in the networking literature on the design and performance analysis of
tree structures for aggregation, for example the work of Goel and Estrin on the construction of trees that
perform well simultaneously under multiple concave costs [38]. Based on our LP formulation, we construct
two examples which show the extent to which trees could give rise to suboptimalities, as opposed to other
topological structures. We start by showing an example in which, although $\mathcal{R} \neq \emptyset$, there are no feasible
trees. This case is illustrated in Fig. 13.




Fig. 13. To illustrate a solvable problem that cannot be solved using trees. Left: a flow network; middle/right: the decomposition of 
a feasible flow into two single flows, showing how much of the flow injected at each source is sent over which link (x/c next to 
an edge means that the edge carries x units of flow, and has capacity c). 

As illustrated in Fig. 13, a solution to the transport problem exists. However, it is easy to check that if we
constrain data to flow along trees, none of the three possible trees ($\{(v_1,v_0), (v_2,v_0)\}$, or $\{(v_1,v_2), (v_2,v_0)\}$,
or $\{(v_2,v_1), (v_1,v_0)\}$) is feasible: in all cases, there is one link for which the capacity constraint is violated.
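
The same kind of check can be carried out mechanically. The Python sketch below uses hypothetical rates and capacities of our own choosing (they are not the values of Fig. 13) for which each of the three trees overloads some link, while a flow that splits the traffic of $v_1$ over two paths is feasible. The function names and the edge-dictionary layout are assumptions made for this example.

```python
def tree_feasible(next_hop, rates, cap, sink=0):
    """Route every source along the tree given by next_hop and compare link loads to capacities."""
    load = {}
    for src, rate in rates.items():
        node = src
        while node != sink:
            edge = (node, next_hop[node])
            load[edge] = load.get(edge, 0.0) + rate
            node = next_hop[node]
    return all(l <= cap.get(edge, 0.0) for edge, l in load.items())


def flow_feasible(flow, rates, cap):
    """Check capacity constraints and flow conservation (outflow - inflow = R_i) at each source."""
    if any(f > cap.get(edge, 0.0) for edge, f in flow.items()):
        return False
    for node, rate in rates.items():
        out = sum(f for (u, _), f in flow.items() if u == node)
        inc = sum(f for (_, v), f in flow.items() if v == node)
        if abs(out - inc - rate) > 1e-9:
            return False
    return True


# Hypothetical instance: R_1 = 2, R_2 = 1, C_10 = 1, C_12 = 1, C_20 = 2, C_21 = 0.
rates = {1: 2.0, 2: 1.0}
cap = {(1, 0): 1.0, (1, 2): 1.0, (2, 0): 2.0}

trees = [
    {1: 0, 2: 0},  # {(v1,v0), (v2,v0)}
    {1: 2, 2: 0},  # {(v1,v2), (v2,v0)}
    {2: 1, 1: 0},  # {(v2,v1), (v1,v0)}
]
tree_results = [tree_feasible(t, rates, cap) for t in trees]

# Split flow: v1 sends one unit directly and one unit via v2, which forwards it with its own data.
split = {(1, 0): 1.0, (1, 2): 1.0, (2, 0): 2.0}
split_ok = flow_feasible(split, rates, cap)
```

With these numbers every entry of `tree_results` is `False`, while `split_ok` is `True`.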

Next we consider a case where feasible trees exist, but the lowest cost of any tree differs from the optimal 
cost by an arbitrarily large factor. This case is illustrated in Fig. 14. 




Fig. 14. To illustrate a problem in which trees are very expensive. Left: a flow network with costs; right: an optimal solution to the 
linear program in Fig. 12. Such a case could arise, e.g., in a situation where there is heavy interference in the direct path from $v_1$
to $v_0$.

In this case, there exists only one feasible tree: $\{(v_1,v_0), (v_2,v_0)\}$, with cost $\ell(1+\epsilon)+1$. However, because
of the "expensive" link $(v_1,v_0)$ along which the tree is forced to send all its data, the cost is significantly
increased: by splitting the encoding of $U_1$ as illustrated in Fig. 14, the cost incurred by this structure
would be $\epsilon\ell + 3$. Hence, we see that in this case, the cost of the best feasible tree is $\frac{\ell(1+\epsilon)+1}{\epsilon\ell+3}$ times larger
than that of an optimal solution allowing splits. And this "overpayment factor" could be significant: when
$\ell$ is large, this is approximately $1 + \frac{1}{\epsilon}$, and it grows unbounded for small $\epsilon$.
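
To get a feeling for the size of this factor, the following Python sketch evaluates the ratio of the two costs quoted above, $\ell(1+\epsilon)+1$ for the tree and $\epsilon\ell+3$ for the split solution, and compares it with the large-$\ell$ limit $1 + 1/\epsilon$. The function name and the numerical values of $\ell$ and $\epsilon$ are assumptions chosen only for illustration.

```python
def overpayment_factor(ell, eps):
    """Ratio between the cost of the only feasible tree and the cost of the split solution."""
    tree_cost = ell * (1 + eps) + 1
    split_cost = eps * ell + 3
    return tree_cost / split_cost


# Hypothetical values: a very expensive direct link (large ell) and small eps.
ratio_a = overpayment_factor(1e6, 0.01)    # close to 1 + 1/0.01 = 101
ratio_b = overpayment_factor(1e6, 0.001)   # larger again, since eps is smaller
limit_a = 1 + 1 / 0.01
```

For fixed $\epsilon$ the ratio approaches $1 + 1/\epsilon$ from below as $\ell$ grows, so halving $\epsilon$ roughly doubles the penalty paid by the tree.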

Note as well that any time that a network is operated close to capacity, it will be necessary to split flows. 
And that is a situation likely to be encountered often in power-constrained networks, since minimum energy 
designs will necessarily result in links being allocated the least amount of power needed to carry a given 
traffic load. Thus, we see that these examples above are not pathological cases of limited practical interest, 
but instead, they are good representatives of situations likely to be encountered often in practice. 

C. Suboptimality of Correlated Codes for Orthogonal Channels 

The key ingredient of the achievability proof presented by Cover, El Gamal and Salehi for the multiple
access channel with correlated sources is the generation of random codes, whose codewords $X_i^N$ are
statistically dependent on the source sequences $U_i^N$ [18]. This property, which is achieved by drawing
the codewords according to $\prod_{j=1}^{N} p(x_{ij}|u_{ij})$ with $u_{ij}$ and $x_{ij}$ denoting the $j$-th element of $U_i^N$ and $X_i^N$,
respectively, implies that $U_i^N$ and $X_i^N$ are jointly typical with high probability. Since the source sequences
$U_1^N$ and $U_2^N$ are correlated, the codewords $X_1^N(U_1^N)$ and $X_2^N(U_2^N)$ are also correlated, and so we speak of
correlated codes. This class of random codes, which is treated in more general terms in [21], can be viewed
as joint source and channel codes that preserve the given correlation structure of the source sequences, based
upon which the decoder can lower the probability of error.

The class of correlated codes is of interest to us for two main reasons:

• From a practical point of view, correlated codes have a very strong appeal: sensor nodes with limited 
processing capabilities may be forced to use very simple codes that do not eliminate correlations between 
measurements prior to transmission [39] (e.g., a simple scalar quantizer and simple BPSK modulation). 

• From a theoretical point of view, since these codes yield the largest known admissibility region for the 
problem of communicating distributed sources over multiple-access channels, it would be interesting 
to know how these codes fare in our context, where we know separate source and channel coding to 
achieve optimality. 

Thus, specializing the achievability proof of [18] to the case of M independent channels, we get the following 
result. 

Corollary 2 (From Theorem 1 of [18]): A set of correlated sources $[U_1 U_2 \ldots U_M]$ can be communicated
reliably over independent channels $(\mathcal{X}_1, p(y_1|x_1), \mathcal{Y}_1) \ldots (\mathcal{X}_M, p(y_M|x_M), \mathcal{Y}_M)$ to a sink $v_0$, if

$$H(U_S|U_{S^c}) < \sum_{i \in S} I(X_i; Y_i | U_{S^c}),$$

for all subsets $S \subseteq \{1, 2, \ldots, M\}$, $S \neq \emptyset$.

Proof: This result can be obtained from the $M$-source version of the main theorem in [18], by
specializing it to a multiple access channel with conditional probability distribution

$$p(y|x_1x_2 \ldots x_M) = p(y_1y_2 \ldots y_M|x_1x_2 \ldots x_M) = \prod_{i=1}^{M} p(y_i|x_i).$$

■

Part of the reason why we feel this is an interesting result is that the main theorem in [18] does not 
immediately specialize to Corollary 1: whereas the achievability results do coincide, [18] does not provide 
a converse. To illustrate this point better, we focus now on the case of M = 2: 

• In general, we have that $I(X_1X_2; Y_1Y_2) \leq I(X_1; Y_1) + I(X_2; Y_2)$, for any $p(u_1u_2x_1x_2)p(y_1|x_1)p(y_2|x_2)$;
but for this upper bound on the sum-rate to be achieved, we must take $p(u_1u_2x_1x_2) = p(u_1u_2)p(x_1)p(x_2)$
- that is, the codewords must be drawn independently of the source. And for this special case, our
Theorem 1 does provide a converse.

• As argued earlier, due to practical considerations it may not be feasible to remove correlations in
the source before choosing channel codewords, in which case we face a situation where correlated
codes are used, despite their obvious suboptimality. In this case, it is of interest to determine the
rate losses resulting from the use of correlated codes, defined as $\Delta_1 = I(X_1; Y_1) - I(X_1; Y_1|U_2)$,
$\Delta_2 = I(X_2; Y_2) - I(X_2; Y_2|U_1)$, and $\Delta_0 = I(X_1; Y_1) + I(X_2; Y_2) - I(X_1X_2; Y_1Y_2)$. Straightforward
manipulations show that $\Delta_1 = I(Y_1; U_2)$, $\Delta_2 = I(Y_2; U_1)$, and $\Delta_0 = I(Y_1; Y_2)$.

• Since $\Delta_i \geq 0$, $i \in \{0,1,2\}$ (mutual information is always nonnegative), we conclude that the region of
achievable rates given by Corollary 2 is contained in the region defined by Corollary 1. Furthermore,
we find that the rate loss terms have a simple, intuitive interpretation: $\Delta_0$ is the loss in sum rate due to
the dependencies between the outputs of different channels, and $\Delta_1$ (or $\Delta_2$) represents the rate loss due
to the dependencies between the outputs of channel 1 (or 2) and the source transmitted over channel 2
(or 1). All these terms become zero if, instead of using correlated codes, we fix $p(x_1)p(x_2)$ and remove
the correlation between the source blocks before transmission over the channels.
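
The three identities for the rate losses can be verified numerically on a small example. The Python sketch below builds the joint distribution of $(U_1, U_2, X_1, X_2, Y_1, Y_2)$ for an assumed toy setup, in which the sources are correlated bits sent uncoded ($X_i = U_i$, an extreme case of a correlated code) over two independent binary symmetric channels, and then compares both sides of each identity. The function names, the crossover probabilities and the choice of channels are ours and serve only as an illustration.

```python
from itertools import product
from math import log2


def entropy(pmf, idx):
    """Entropy in bits of the coordinates listed in idx, for a pmf over tuples."""
    marg = {}
    for outcome, p in pmf.items():
        key = tuple(outcome[i] for i in idx)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * log2(p) for p in marg.values() if p > 0)


def mutual_info(pmf, a, b, c=()):
    """I(A;B|C) = H(AC) + H(BC) - H(ABC) - H(C), with a, b, c tuples of coordinate indices."""
    return (entropy(pmf, a + c) + entropy(pmf, b + c)
            - entropy(pmf, a + b + c) - entropy(pmf, c))


def toy_joint(q=0.1, e1=0.05, e2=0.2):
    """Joint pmf of (U1, U2, X1, X2, Y1, Y2): U2 = U1 xor Z, X_i = U_i, Y_i = X_i xor noise_i."""
    pmf = {}
    for u1, z, n1, n2 in product((0, 1), repeat=4):
        u2 = u1 ^ z
        x1, x2 = u1, u2
        y1, y2 = x1 ^ n1, x2 ^ n2
        p = 0.5 * (q if z else 1 - q) * (e1 if n1 else 1 - e1) * (e2 if n2 else 1 - e2)
        key = (u1, u2, x1, x2, y1, y2)
        pmf[key] = pmf.get(key, 0.0) + p
    return pmf


U1, U2, X1, X2, Y1, Y2 = (0,), (1,), (2,), (3,), (4,), (5,)
pmf = toy_joint()

delta_1 = mutual_info(pmf, X1, Y1) - mutual_info(pmf, X1, Y1, U2)
delta_2 = mutual_info(pmf, X2, Y2) - mutual_info(pmf, X2, Y2, U1)
delta_0 = (mutual_info(pmf, X1, Y1) + mutual_info(pmf, X2, Y2)
           - mutual_info(pmf, X1 + X2, Y1 + Y2))
```

In this example `delta_1`, `delta_2` and `delta_0` agree (up to rounding) with `mutual_info(pmf, Y1, U2)`, `mutual_info(pmf, Y2, U1)` and `mutual_info(pmf, Y1, Y2)`, and all three are strictly positive because the uncoded inputs inherit the correlation of the sources.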

At first glance, this observation may seem somewhat surprising, since the problem addressed by Corollary 1 
is a special case of the multiple access channel with correlated sources considered in [18], where it is shown 
that in the general case correlated codes outperform the concatenation of Slepian-Wolf codes (independent 
codewords) and optimal channel codes. The crucial difference between the two problems is the presence (or 
absence) of interference in the channel. Albeit somewhat informally, we can state that correlated codes are 
advantageous when the transmitted codewords are combined in the channel through interference, which is
obviously not the case in our problem. Practical code constructions built around this observation have been
reported in [39]. 

IV. Conclusions 

A. Summary 

In this paper we have considered the problem of encoding a set of distributed correlated sources for delivery 
to a single data collector node over a network of DMCs. For this setup we were able to obtain single-letter 
information theoretic conditions that provide an exact characterization of the admissibility problem. Two 
important conclusions follow from the achievability proof: 

• Separate source/channel coding is optimal in any network with one sink in which interference is dealt 
with at the MAC layer by creating independent links among nodes. 

• In such networks, the properties of Shannon information are exactly identical to those of water in pipes 
- information is a flow. 

B. Discussion 

A few interesting observations follow from our results: 

• It is a well known fact that turning a multiple access channel into an array of orthogonal channels 
by using a suitable MAC protocol is a suboptimal strategy in general, in the sense that the set of 
rates that are achievable with orthogonal access is strictly contained in the Ahlswede-Liao capacity 
region [5, Ch. 14.3]. However, despite its inherent suboptimality, there are strong economic incentives 
for the deployment of networks based on such technologies, related to the low complexity and cost 
of existing solutions, as well as experience in the fabrication and operation of such systems. As a 
result, most existing standard implementations we are aware of (e.g., the IEEE 802.11 and 802.15.* 
families, or Bluetooth), are based on variants of protocols like TDMA/FDMA/CDMA or Aloha, that 
treat interference among users as noise or collisions, and deal with it by creating orthogonal links. We 
feel therefore that some of the interest in our results stems from the fact that they provide a thorough 
analysis for what we deem to be, with high likelihood, the vast majority of wireless communication 
networks to be deployed for the foreseeable future. 

• A basic question follows from the results in this paper: when exactly does Shannon information act like 
a classical flow in a network setup? In this paper, we showed that far more often than common wisdom 
would suggest: for any network made up of independent links and one sink, Shannon information is a 
flow. The assumption of independence among channels is crucial, since well known counterexamples
hold without it [18]. But, as argued before, far from being just some technical assumption needed for
the theory to hold, independent channels arise naturally in practical applications. In establishing the 
flow properties of information, we showed how some well understood network flow tools can be applied 
to address network design problems that have traditionally been difficult to deal with using standard 
tools in network information theory, and we illustrated this with a simple example involving optimal 
routing. In particular we showed that, at least from an information theoretic point of view, there is 
little justification for the common practice of designing trees for collecting data picked up by a sensor 
network, thus opening up interesting problems of protocol design. 
• In retrospect, perhaps the results we prove in this paper should not have been surprising. In the context 
of two-terminal networks, we do know the following: 

- Feedback does not increase the capacity. Therefore, the capacity of individual links is unaffected 
by the ability of our codes to establish a conference mechanism among nodes. 

- Compression rates are not reduced by explicit cooperation, as it follows from the Slepian-Wolf
theorem: the minimum rate required to communicate $U_1$ to a decoder that has access to side-information
$U_0$ is $H(U_1|U_0)$, and knowledge of $U_0$ does not reduce the rates needed for coding
$U_1$. Therefore, the amount of information that needs to flow through our network is not reduced
either by the ability of nodes to establish conferences.

Of course the statements above only hold for individual links, and a proof was needed to carry that 
intuition to the general network setup considered in this work. But those observations we think are the 
key to understanding why our results hold. 

C. Future Work 

After having established coding theorems for the problem of network information flow with correlated
sources, a natural question arises: what if, in a given scenario, $\mathcal{R} = \emptyset$? In that case, the best we can
hope for is to reconstruct an approximation to the original source message, and the answer is given by
rate-distortion theory [40]. The rate-distortion formulation of our problem in the case of non-cooperating
encoders is equivalent to the well known (and still open) Multiterminal Source Coding problem [12]. Our
current efforts are focused on completing work on the rate/distortion problem, and on fully developing the
ideas outlined in Section III-B (e.g., to deal with problems of the type considered in [41]).

Appendix

A. Converse Proof for Theorem 1 

1) Preliminaries: Assume there exists a sequence of codes such that the decoder at $v_0$ is capable of
producing a perfect reconstruction of blocks of $N$ snapshots $U^N = [U_1^N U_2^N \ldots U_M^N]$, with $P_e^{(N)} \to 0$ as
$N \to \infty$. Consider now decoding $L$ blocks of $N$ snapshots (indexed by $l = 0 \ldots L-1$):

• The 1-st block of snapshots ($l = 0$) is computed based on messages received by $v_0$ from all nodes
$v_i$ at times $kN$ ($k = 0 \ldots W-1$).

• The 2-nd block of snapshots ($l = 1$) is computed based on messages $Y_{i0}^N$ received by $v_0$ from all nodes
$v_i$ at times $kN$ ($k = 1 \ldots W$).

• The $L$-th block of snapshots ($l = L-1$) is computed based on messages $Y_{i0}^N$ received by $v_0$ from all
nodes $v_i$ at times $kN$ ($k = L-1 \ldots W+(L-2)$).

Thus, we regard the network as a pipeline, in which "packets" (i.e., blocks of N source symbols injected 
by each source) take NW units of time to flow, and each source gets to inject L packets total. We are 
interested in the behavior of this pipeline in the regime of large L. 

For any fixed $L$, the probability of at least one of the $L$ blocks being decoded in error is $P_e^{(LN)} =
1 - (1 - P_e^{(N)})^L$. Thus, from the existence of a code with low block probability of error we can infer the
existence of codes for which the probability of error for the entire pipeline is low as well, by considering
a large enough block length $N$.
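
As a quick numerical illustration of this step, the following Python sketch evaluates the pipeline error probability $1 - (1 - P_e^{(N)})^L$ for a fixed number of blocks and decreasing per-block error probabilities, and compares it with the union bound $L \cdot P_e^{(N)}$. The function name and the numerical values are assumptions chosen for the example.

```python
def pipeline_error(p_block, num_blocks):
    """Probability that at least one of num_blocks blocks is in error: 1 - (1 - p)^L."""
    return 1.0 - (1.0 - p_block) ** num_blocks


# Hypothetical pipeline of L = 100 blocks and three per-block error probabilities.
L = 100
block_errors = [1e-2, 1e-4, 1e-6]
pipeline_errors = [pipeline_error(p, L) for p in block_errors]
union_bounds = [L * p for p in block_errors]
```

For fixed $L$ the pipeline error never exceeds the union bound and vanishes as the per-block error probability does, which is all that the argument above requires.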

We begin with Fano's inequality. If there is a suitable code as defined in the problem statement, then we
must have

$$H(U_1^{LN} U_2^{LN} \ldots U_M^{LN} | \hat{U}_1^{LN} \hat{U}_2^{LN} \ldots \hat{U}_M^{LN}) \leq P_e^{(LN)} \log |\mathcal{U}_1^{LN} \times \mathcal{U}_2^{LN} \times \ldots \times \mathcal{U}_M^{LN}| + h(P_e^{(LN)}), \tag{11}$$

where $h(\cdot)$ denotes the binary entropy function, and $\hat{U}_i^{LN} = (\hat{U}_i^N(1), \hat{U}_i^N(2), \ldots, \hat{U}_i^N(L))$ denotes $L$ blocks
of $N$ snapshots reconstructed at $v_0$. For convenience, we define also

It follows from eqn. (11) that

$$
\begin{aligned}
H(U_1^{LN} U_2^{LN} \ldots U_M^{LN} | U_0^{LN} Y_{10}^{BN} Y_{20}^{BN} \ldots Y_{M0}^{BN})
&\overset{(a)}{=} H(U_1^{LN} U_2^{LN} \ldots U_M^{LN} | U_0^{LN} Y_{10}^{BN} Y_{20}^{BN} \ldots Y_{M0}^{BN} \hat{U}_1^{LN} \hat{U}_2^{LN} \ldots \hat{U}_M^{LN}) \\
&\leq H(U_1^{LN} U_2^{LN} \ldots U_M^{LN} | \hat{U}_1^{LN} \hat{U}_2^{LN} \ldots \hat{U}_M^{LN}) \\
&\leq LN\delta(P_e^{(LN)}),
\end{aligned}
$$

where $Y_{ij}^{BN} = (Y_{ij}^N(1), Y_{ij}^N(2), \ldots, Y_{ij}^N(B))$ denotes $B = W + (L - 1)$ blocks of $N$ channel outputs
observed by node $v_j$ while communicating with node $v_i$, and (a) follows from the fact that the estimates $\hat{U}_i^{LN}$,
$i = 1 \ldots M$, are functions of $U_0^{LN}$ and of the received channel outputs $Y_{i0}^{BN}$, $i = 1 \ldots M$. From the chain
rule for entropy, from the fact that conditioning does not increase entropy, and for any $S \subseteq \mathcal{M} = \{0 \ldots M\}$,
$S \neq \emptyset$, $0 \in S^c$, it follows that

H{Ujj»\UFYilk.YiZ 8 .) < H(Uk N \U^Y^ Y s B ^ ) < LN5{P^). (12) 
Let the set of $B$ codewords sent by the nodes in a subset $A$ to the nodes in a subset $D$ be

$$X_{A \to D}^{BN} = \{X_{ij}^{BN} : i \in A \text{ and } j \in D\},$$

and, likewise, the corresponding channel outputs be denoted as

$$Y_{A \to D}^{BN} = \{Y_{ij}^{BN} : i \in A \text{ and } j \in D\}.$$

We will make use of the following lemmas.

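Lemma 1: Let $X_{S \to S^c}$ be a set of channel inputs and $Y_{S \to S^c}$ be a set of channel outputs of an array of
independent channels $\{\mathcal{X}_{ij}, p_{ij}(y|x), \mathcal{Y}_{ij}\}$, $\forall i \in S$ and $\forall j \in S^c$. Then,

$$I(X_{S \to S^c}; Y_{S \to S^c}) \leq \sum_{i \in S, j \in S^c} I(X_{ij}; Y_{ij}). \tag{13}$$

Proof: Without loss of generality, assume that $S = \{1, \ldots, x_0\}$ and $S^c = \{x_0 + 1, \ldots, M\}$. From the
definition of mutual information, it follows that

$$I(X_{S \to S^c}; Y_{S \to S^c}) = H(Y_{S \to S^c}) - H(Y_{S \to S^c}|X_{S \to S^c}).$$

Expanding the first term on the right-hand side, we get

$$
\begin{aligned}
H(Y_{S \to S^c}) &\leq \sum_{i \in S} H(Y_{i \to S^c}) \\
&= \sum_{i \in S} H(Y_{i \to x_0+1} Y_{i \to x_0+2} \ldots Y_{i \to M}) \\
&\leq \sum_{i \in S} \sum_{j \in S^c} H(Y_{ij}).
\end{aligned}
$$

Similarly, the second term reduces to

$$
\begin{aligned}
H(Y_{S \to S^c}|X_{S \to S^c})
&= H(Y_{1 \to S^c} Y_{2 \to S^c} \ldots Y_{x_0 \to S^c} | X_{1 \to S^c} X_{2 \to S^c} \ldots X_{x_0 \to S^c}) \\
&= H(Y_{1 \to S^c} | X_{1 \to S^c} X_{2 \to S^c} \ldots X_{x_0 \to S^c}) + \sum_{i=2}^{x_0} H(Y_{i \to S^c} | X_{1 \to S^c} X_{2 \to S^c} \ldots X_{x_0 \to S^c} Y_{1 \to S^c} \ldots Y_{i-1 \to S^c}) \\
&= H(Y_{1 \to S^c} | X_{1 \to S^c}) + \sum_{i=2}^{x_0} H(Y_{i \to S^c} | X_{i \to S^c}) \\
&= \sum_{i \in S} H(Y_{i \to S^c} | X_{i \to S^c}) \\
&= \sum_{i \in S} H(Y_{i \to x_0+1} Y_{i \to x_0+2} \ldots Y_{i \to M} | X_{i \to x_0+1} X_{i \to x_0+2} \ldots X_{i \to M}) \\
&= \sum_{i \in S} \Big( H(Y_{i \to x_0+1} | X_{i \to x_0+1} X_{i \to x_0+2} \ldots X_{i \to M}) + \sum_{j=x_0+2}^{M} H(Y_{i \to j} | X_{i \to x_0+1} X_{i \to x_0+2} \ldots X_{i \to M} Y_{i \to x_0+1} \ldots Y_{i \to j-1}) \Big) \\
&= \sum_{i \in S} \Big( H(Y_{i \to x_0+1} | X_{i \to x_0+1}) + \sum_{j=x_0+2}^{M} H(Y_{i \to j} | X_{i \to j}) \Big) \\
&= \sum_{i \in S} \sum_{j \in S^c} H(Y_{ij}|X_{ij}).
\end{aligned}
$$

Combining the two expressions, we get

$$I(X_{S \to S^c}; Y_{S \to S^c}) \leq \sum_{i \in S, j \in S^c} \big( H(Y_{ij}) - H(Y_{ij}|X_{ij}) \big) = \sum_{i \in S, j \in S^c} I(X_{ij}; Y_{ij}),$$

thus proving the lemma.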

Lemma 2: $U_S^{LN} \to (U_{S^c}^{LN} Y_{S \to S^c}^{BN}) \to Y_{S^c \to S^c}^{BN}$ forms a Markov chain.



October 2, 2005. DRAFT 






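Proof: We begin by expanding $p(u_S^{LN} u_{S^c}^{LN} y_{S \to S^c}^{BN} y_{S^c \to S^c}^{BN})$ according to

$$p(u_S^{LN} u_{S^c}^{LN} y_{S \to S^c}^{BN} y_{S^c \to S^c}^{BN}) = p(u_S^{LN}) \cdot p(u_{S^c}^{LN} y_{S \to S^c}^{BN} | u_S^{LN}) \cdot p(y_{S^c \to S^c}^{BN} | u_S^{LN} u_{S^c}^{LN} y_{S \to S^c}^{BN}).$$

To prove that $U_S^{LN}$ can be removed from the last factor in the previous expression, we will use an induction
argument on the length of the pipeline, $L$, and window sizes, $K$ and $T$.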

Fix $(S, S^c)$ and $i, j \in S^c$. Let $L = K = T = 1$. The encoding functions produce $g_{ij}(U_i^N) = X_{ij}^N$, which
result in the channel outputs $Y_{ij}^N$ after transmission over the DMC between nodes $i$ and $j$. In shorthand,
we write

$$g_{ij}(U_i^N) = X_{ij}^N \xrightarrow{\text{DMC}} Y_{ij}^N.$$

Thus, the first block of channel inputs $X_{S^c \to S^c}^N$ generated in the node set $S^c$ depends only on source symbols
$U_{S^c}^N$ available in $S^c$. Moreover, since the channels are DMCs, the channel outputs depend only on the
channel inputs. Thus, we conclude that $U_S^N$ and $Y_{S^c \to S^c}^N$ are independent given $U_{S^c}^N$.

Since we consider a pipeline of length L = 1, there are no more blocks to inject, but not all data may have 
arrived to destination, so we have to allow for a few (W, to be precise) extra transmissions. By "flushing 
the pipeline", we have 



„ f V l...N v l...N\ V N+1...2N dmc V N+1...2N 
9ij\ Y S^i Y S^i) — ^i^j *■ Y i^j 

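It follows that $Y_{S^c \to S^c}^{N+1 \ldots 2N}$ is independent of $U_S^{1 \ldots N}$ given $Y_{S \to S^c}^{1 \ldots N}$ and $U_{S^c}^{1 \ldots N}$. Similarly, we have

\[
g_{ij}(Y_{S \to i}^{(W-2)N+1 \ldots (W-1)N}\, Y_{S^c \to i}^{(W-2)N+1 \ldots (W-1)N}) = X_{i \to j}^{(W-1)N+1 \ldots WN} \xrightarrow{\text{DMC}} Y_{i \to j}^{(W-1)N+1 \ldots WN},
\]

from which we conclude that $Y_{S^c \to S^c}^{(W-1)N+1 \ldots WN}$ is independent of $U_S^{1 \ldots N}$ given $Y_{S \to S^c}^{(W-2)N+1 \ldots (W-1)N}$ and $U_{S^c}^{1 \ldots N}$. Thus, for $K = T = L = 1$, and $W$ arbitrary,$^5$ the Markov chain in the lemma holds (with $B = L + W - 1$).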

To proceed with the inductive proof, we still take $K = T = 1$, $(S, S^c)$ fixed, and $i, j \in S^c$, but $L$ is now arbitrary. By the inductive hypothesis, we have the following Markov chain:

\[
U_S^{(L-1)N} \to (U_{S^c}^{(L-1)N}, Y_{S \to S^c}^{(B-1)N}) \to Y_{S^c \to S^c}^{(B-1)N}.
\]

Encoding and transmission of the last block of each source yields 

\[
g_{ij}(U_i^{(L-1)N+1 \ldots LN}\, Y_{S \to i}^{(L-1)N+1 \ldots LN}\, Y_{S^c \to i}^{(L-1)N+1 \ldots LN}) = X_{i \to j}^{LN+1 \ldots (L+1)N} \xrightarrow{\text{DMC}} Y_{i \to j}^{LN+1 \ldots (L+1)N},
\]

such that for the last block, we have that

\[
U_S^{LN} \to (U_{S^c}^{LN}, Y_{S \to S^c}^{(L+1)N}) \to Y_{S^c \to S^c}^{(L+1)N}.
\]

$^5$ Since $W$ is the delay used to allow data to flow to the destination, it would not be reasonable to perform induction on $W$ for a given fixed network. Instead we take $W$ as a parameter, which must be greater than or equal to the diameter of the network.

This is not yet the sought Markov chain, as we still need to flush the pipe. But similarly to how it was done 
for the base case of this inductive argument, we have that 

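\[
\begin{aligned}
g_{ij}(Y_{S \to i}^{LN+1 \ldots (L+1)N}\, Y_{S^c \to i}^{LN+1 \ldots (L+1)N}) &= X_{i \to j}^{(L+1)N+1 \ldots (L+2)N} \xrightarrow{\text{DMC}} Y_{i \to j}^{(L+1)N+1 \ldots (L+2)N} \\
&\;\;\vdots \\
g_{ij}(Y_{S \to i}^{(B-2)N+1 \ldots (B-1)N}\, Y_{S^c \to i}^{(B-2)N+1 \ldots (B-1)N}) &= X_{i \to j}^{(B-1)N+1 \ldots BN} \xrightarrow{\text{DMC}} Y_{i \to j}^{(B-1)N+1 \ldots BN}
\end{aligned}
\]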



and therefore we now have that $Y_{S^c \to S^c}^{BN}$ is independent of $U_S^{LN}$ given $Y_{S \to S^c}^{BN}$ and $U_{S^c}^{LN}$.

The proof of the lemma is completed by performing the exact same induction steps on K and T as done 
on L. For brevity, those same steps are omitted from this proof. ■ 

2) Main Proof: We now take an arbitrary non-empty subset $S \subset \mathcal{M} = \{0, \ldots, M\}$, $S \neq \emptyset$, $0 \in S^c$, and start by bounding $H(U_S^{LN})$ according to

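\[
\begin{aligned}
H(U_S^{LN}) &= I(U_S^{LN}; U_{S^c}^{LN} Y_{S \to S^c}^{BN} Y_{S^c \to S^c}^{BN}) + H(U_S^{LN} \mid U_{S^c}^{LN} Y_{S \to S^c}^{BN} Y_{S^c \to S^c}^{BN}) \\
&\overset{(a)}{\le} I(U_S^{LN}; U_{S^c}^{LN} Y_{S \to S^c}^{BN} Y_{S^c \to S^c}^{BN}) + LN\,\delta(P_e^{(LN)}) \\
&= I(U_S^{LN}; U_{S^c}^{LN}) + I(U_S^{LN}; Y_{S \to S^c}^{BN} \mid U_{S^c}^{LN}) + I(U_S^{LN}; Y_{S^c \to S^c}^{BN} \mid U_{S^c}^{LN} Y_{S \to S^c}^{BN}) + LN\,\delta(P_e^{(LN)}),
\end{aligned}
\]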

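where (a) follows from (12). From Lemma 2, we have that $I(U_S^{LN}; Y_{S^c \to S^c}^{BN} \mid U_{S^c}^{LN} Y_{S \to S^c}^{BN}) = 0$, and so we get

\[
H(U_S^{LN}) \le I(U_S^{LN}; U_{S^c}^{LN}) + I(U_S^{LN}; Y_{S \to S^c}^{BN} \mid U_{S^c}^{LN}) + LN\,\delta(P_e^{(LN)}). \tag{14}
\]

Developing the second term on the right-hand side yields: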

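\[
\begin{aligned}
I(U_S^{LN}; Y_{S \to S^c}^{BN} \mid U_{S^c}^{LN})
&= \sum_{k=1}^{BN} I(U_S^{LN}; Y_{S \to S^c}(k) \mid U_{S^c}^{LN} Y_{S \to S^c}^{k-1}) \\
&\le \sum_{k=1}^{BN} I(X_{S \to S^c}(k)\, U_S^{LN}; Y_{S \to S^c}(k) \mid U_{S^c}^{LN} Y_{S \to S^c}^{k-1}) \\
&\overset{(a)}{=} \sum_{k=1}^{BN} I(X_{S \to S^c}(k); Y_{S \to S^c}(k) \mid U_{S^c}^{LN} Y_{S \to S^c}^{k-1}) \\
&= \sum_{k=1}^{BN} \big[ H(Y_{S \to S^c}(k) \mid U_{S^c}^{LN} Y_{S \to S^c}^{k-1}) - H(Y_{S \to S^c}(k) \mid U_{S^c}^{LN} Y_{S \to S^c}^{k-1} X_{S \to S^c}(k)) \big] \\
&\overset{(b)}{=} \sum_{k=1}^{BN} \big[ H(Y_{S \to S^c}(k) \mid U_{S^c}^{LN} Y_{S \to S^c}^{k-1}) - H(Y_{S \to S^c}(k) \mid X_{S \to S^c}(k)) \big] \\
&\overset{(c)}{\le} \sum_{k=1}^{BN} \big[ H(Y_{S \to S^c}(k)) - H(Y_{S \to S^c}(k) \mid X_{S \to S^c}(k)) \big] \\
&= \sum_{k=1}^{BN} I(X_{S \to S^c}(k); Y_{S \to S^c}(k)) \\
&\overset{(d)}{\le} \sum_{k=1}^{BN} \sum_{i \in S,\, j \in S^c} I(X_{ij}(k); Y_{ij}(k)) \\
&= \sum_{i \in S,\, j \in S^c} \sum_{k=1}^{BN} I(X_{ij}(k); Y_{ij}(k)) \\
&\le \sum_{i \in S,\, j \in S^c} BN\, C_{ij},
\end{aligned}
\]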

where we use the following arguments: 

(a) given the channel inputs $X_{S \to S^c}(k)$, the channel outputs $Y_{S \to S^c}(k)$ are independent of all other random variables;

(b) same as (a);

(c) conditioning does not increase the entropy;

(d) direct application of Lemma 1.
Substituting in (14) yields 

\[
H(U_S^{LN}) \le I(U_S^{LN}; U_{S^c}^{LN}) + \sum_{i \in S,\, j \in S^c} BN\, C_{ij} + LN\,\delta(P_e^{(LN)}).
\]

Using the fact that the sources are drawn i.i.d., this last expression can be rewritten as 

\[
LN\, H(U_S) \le LN\, I(U_S; U_{S^c}) + \sum_{i \in S,\, j \in S^c} BN\, C_{ij} + LN\,\delta(P_e^{(LN)}),
\]

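or equivalently,

\[
H(U_S \mid U_{S^c}) \le \frac{B}{L} \sum_{i \in S,\, j \in S^c} C_{ij} + \delta(P_e^{(LN)}) \le \frac{W + L - 1}{L} \sum_{i \in S,\, j \in S^c} C_{ij} + \delta(P_e^{(LN)}).
\]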
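To get a feel for the prefactor $(W + L - 1)/L$ in this bound, the short sketch below evaluates it for a few pipeline lengths. The flush delay `W = 4` and the helper name `pipeline_factor` are hypothetical choices made only for this illustration.

```python
def pipeline_factor(L, W):
    """Ratio B/L = (W + L - 1)/L of channel blocks used to source blocks sent."""
    if L < 1 or W < 1:
        raise ValueError("L and W must be positive integers")
    return (W + L - 1) / L


W = 4  # hypothetical flush delay (at least the network diameter)
lengths = (1, 10, 100, 1000)
factors = [pipeline_factor(L, W) for L in lengths]
for L, f in zip(lengths, factors):
    print(f"L = {L:5d}: (W + L - 1)/L = {f:.3f}")
```

The factor decreases toward 1 as the pipeline gets longer, so the cost of flushing the pipeline is amortized over more and more source blocks.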






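Finally, we observe that this inequality holds for all finite values of $L$. Thus, it must also be the case that

\[
H(U_S \mid U_{S^c}) \le \inf_{L} \Big( \frac{W + L - 1}{L} \sum_{i \in S,\, j \in S^c} C_{ij} + \delta(P_e^{(LN)}) \Big).
\]

But since $(W + L - 1)/L \to 1$ as $L \to \infty$, and $\delta(P_e^{(LN)})$ goes to zero as $P_e^{(LN)} \to 0$, we get

\[
H(U_S \mid U_{S^c}) \le \sum_{i \in S,\, j \in S^c} C_{ij},
\]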

thus concluding the proof. ■ 
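The necessary condition just derived can be checked mechanically for a small network. The sketch below enumerates every cut $S$ with $0 \in S^c$ and compares $H(U_S \mid U_{S^c})$ against the total capacity leaving $S$. The function names, the three-node source distribution, and the capacity values are all hypothetical and serve only as an illustration under these assumptions; the enumeration is exponential in the number of nodes, so it is meant for toy examples.

```python
from itertools import combinations
from math import log2


def entropy(pmf):
    """Entropy in bits of a dict {outcome: probability}."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)


def marginal(pmf, idx):
    """Marginal of the joint pmf over the coordinates listed in idx."""
    out = {}
    for u, p in pmf.items():
        key = tuple(u[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out


def conditional_entropy(pmf, S, Sc):
    """H(U_S | U_Sc) = H(U_S, U_Sc) - H(U_Sc)."""
    return entropy(marginal(pmf, S + Sc)) - entropy(marginal(pmf, Sc))


def violated_cuts(pmf, capacity, n_nodes, tol=1e-12):
    """Return every cut S (node 0 always in S^c) whose cut capacity is too small."""
    bad = []
    for r in range(1, n_nodes):
        for S in combinations(range(1, n_nodes), r):
            Sc = tuple(v for v in range(n_nodes) if v not in S)
            cut = sum(capacity.get((i, j), 0.0) for i in S for j in Sc)
            if conditional_entropy(pmf, S, Sc) > cut + tol:
                bad.append(S)
    return bad


# Hypothetical sources: U0 is constant, U1 is a fair bit, U2 differs from U1
# with probability 0.1. Outcomes are tuples (u0, u1, u2).
pmf = {(0, 0, 0): 0.45, (0, 0, 1): 0.05, (0, 1, 0): 0.05, (0, 1, 1): 0.45}

# Hypothetical line network 2 -> 1 -> 0 with capacities in bits per channel use.
good = {(2, 1): 0.5, (1, 0): 1.5}
weak = {(2, 1): 0.5, (1, 0): 1.0}

print(violated_cuts(pmf, good, 3))  # no cut is violated
print(violated_cuts(pmf, weak, 3))  # the cut S = {1, 2} is too small
```

In the second configuration the link into node 0 carries at most 1 bit per use, while $H(U_1 U_2 \mid U_0) = 1 + h(0.1) \approx 1.469$ bits, so the cut $S = \{1, 2\}$ fails the condition.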